ABSTRACT

The aim of this paper is to present a first evaluation of a dynamic partition strategy associated with the recently proposed asynchronous distributed computation scheme based on the D-iteration approach. The D-iteration is an iterative method, based on a fluid diffusion point of view, for numerically solving linear equations. Using a simple static partition strategy, it has been shown that, when the computation is distributed over K virtual machines (PIDs), the memory size to be handled by each virtual machine decreases linearly with K and the computation speed increases almost linearly with K, with a slope becoming closer to one as the number N of linear equations to be solved increases. Here, we want to evaluate how much those results can be further improved when a simple dynamic partition strategy is deployed, and to show that the dynamic partition strategy allows one to control and equalize the computation load between PIDs without any deep analysis of the matrix or of the underlying graph structure.

Categories and Subject Descriptors 

G.1.0 [Mathematics of Computing]: Numerical Analysis - Parallel algorithms; G.1.3 [Mathematics of Computing]: Numerical Analysis - Numerical Linear Algebra; C.2.4 [Computer Systems Organization]: Computer-Communication Networks - Distributed Systems

General Terms 

Algorithms, Performance 

Keywords 

Distributed computation, Iteration, Fixed point, Eigenvector.

1. INTRODUCTION 

Efficiently solving very large systems of linear equations (and the related initial problems) is a very classical problem and a challenge for algorithm design. The complexity of numerically solving very large linear systems may increase rapidly with the dimension of the vector space. There are many known approaches to this class of problems: Gauss elimination, Jacobi iteration, Gauss-Seidel iteration, SOR (successive over-relaxation), Richardson, Krylov, gradient methods, power iteration, QR algorithm, etc. [ri], [31], [4], [To], [21], [33]. There are also more specific approaches for particular cases, for instance when the linear equations are associated with a sparse matrix (in particular, in the context of the PageRank equation [26], [6], [7], [2]: power method [30] with adaptation [20] or extrapolation [12], [21], [8], iterative aggregation/disaggregation methods [27], [18], [29], adaptive on-line method [2], etc.). The case of symmetric and diagonally dominant (SDD) systems [9], [32], [24] is also a very interesting case that has been deeply investigated. In parallel, there has been a lot of research on the distributed computation of linear equations [5], [19], [28], [23], [22], with a particular interest in asynchronous iteration schemes.

The algorithm proposed here is a new solution for a class of problems we could call diagonally dominant (DD) systems, based on recent research results on the D-iteration. The D-iteration method was initially introduced in [17] to numerically solve for the eigenvector of the PageRank equation (the eigenvector defining the page importance scores). Its applicability to general linear equations has been described in [16]. The distributed architecture based on this algorithm was first proposed in [13] and then evaluated through simulations in [14] with static partition strategies. It has been shown in [14] that, when the computation is distributed over K virtual machines (PIDs), the memory size to be handled by each virtual machine decreases linearly with K and the computation speed increases almost linearly with K, with a slope becoming closer to one as the number N of linear equations to be solved increases. However, those results were obtained under the assumption that the information diffusion cost can be neglected; in particular, the computation cost of the fluid quantities to be diffused was neglected. Such an assumption is not realistic when K becomes large or, more precisely, when N/K becomes small.

Refining and redesigning the algorithms that were proposed in [16], [13], [14], [15], we revisit here the results of [14] and evaluate the benefit of a simple and natural dynamic partition strategy that controls and equalizes the workload of each virtual machine when parallel computation is used. Such a dynamic scheme may also be required when the underlying graph structure evolves continuously in time and updates are applied continuously (cf. [15]).

In Section 2, we describe the distributed architecture considered in this paper. Section 3 presents the evaluation based on synthetic data and on a web graph dataset.

2. DISTRIBUTED ARCHITECTURE 

In this paper, we evaluate the performance of the proposed distributed algorithm on the eigenvector problem associated with the PageRank-type equation. However, the algorithm is described here in a more general setting. We assume we are given a square matrix P of size N x N and an initial condition B (a vector of size N). The D-iteration applied to (P, B) solves for X (a vector of size N) satisfying:

X = P.X + B. 

The approach proposed here should work as long as the spectral radius of P is strictly less than 1 (this is what we called a diagonally dominant system in the introduction). In particular, the entries of P or B may be positive or negative (cf. [IS]). However, for a better intuitive understanding, we chose here to focus on the case where all entries of P are non-negative and implicitly associated with a transition matrix.
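For reference, when the spectral radius of P is below 1, the fixed point is the Neumann series X = B + P.B + P^2.B + .... The Python sketch below is not the D-iteration: it simply sums this series with the iteration X := P.X + B, as a baseline to compare against. The function name `fixed_point` and the 3-node PageRank-style example (damping factor d = 0.85, B = (1 - d)/N) are illustrative assumptions.

```python
def fixed_point(P, B, tol=1e-12, max_iter=10_000):
    """Solve X = P.X + B with the simple iteration X_{n+1} = P.X_n + B.

    P is a dense list of rows (P[i][j] = p(i,j)).  Starting from X_0 = B,
    X_n is the partial sum B + P.B + ... + P^n.B of the Neumann series,
    which converges when the spectral radius of P is strictly below 1.
    """
    n = len(B)
    X = list(B)
    for _ in range(max_iter):
        new = [sum(P[i][j] * X[j] for j in range(n)) + B[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, X)) < tol:
            return new
        X = new
    raise RuntimeError("no convergence: the spectral radius of P may be >= 1")


# Hypothetical 3-node PageRank-style example (damping factor d = 0.85 assumed):
# links[i][j] = 1/#out_j if node j links to node i, so every column sums to 1.
d, N = 0.85, 3
links = [[0.0, 0.5, 1.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.5, 0.0]]
P = [[d * links[i][j] for j in range(N)] for i in range(N)]
B = [(1 - d) / N] * N
X = fixed_point(P, B)
print([round(x, 4) for x in X])
```

Since every column of this P sums to d < 1, the column-sum norm of P, and hence its spectral radius, is at most d, which guarantees convergence.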

2.1 D-iteration: diffusion approach 

We recall that the D-iteration is based on the fluid diffusion approach, where one step of the iteration consists in choosing a node i_n (at the n-th step) and diffusing all the fluid at node i_n to its children nodes (the non-zero entries of the i_n-th column of P). At each step of the iteration, we keep two state vectors: the current residual/transient fluid is described by the vector F_n, and the history of the fluid diffusion (counting the amount of fluid diffused by each node) by the vector H_n.

Below is an adaptation of the pseudo-code of [14] to the general case:



Initialization:
For i = 1..N:
    H[i] := 0;      // History (counter)
    F[i] := B_i;    // Fluid
r := |F| = sum_j |F[j]|;

Iteration:
While ( r > Target_Error )
    Choose i;       // node selection
    sent := F[i];
    H[i] += sent;
    F[i] := 0;
    For (j such that p(j,i) != 0):
        F[j] += sent * p(j,i);
    r := |F| = sum_j |F[j]|;

When the above scheme converges (DD system), we have asymptotically X = H (when Target_Error → 0).
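The pseudo-code above can be turned into the following runnable Python sketch. The pseudo-code leaves the node selection open; here we assume the greedy choice of the node holding the largest fluid, and P is stored column by column (`cols[i]` holds the pairs (j, p(j,i))), a hypothetical layout chosen for illustration.

```python
def d_iteration(cols, B, target_error=1e-10, max_steps=1_000_000):
    """Sequential D-iteration for X = P.X + B.

    cols[i] lists the (j, p(j,i)) pairs of the non-zero entries of the i-th
    column of P, i.e. the children of node i.  F is the residual fluid and
    H the history (total fluid diffused by each node); H tends to X.
    """
    n = len(B)
    H = [0.0] * n
    F = list(B)
    r = sum(abs(f) for f in F)
    steps = 0
    while r > target_error:
        # Node selection: greedy choice of the node holding the most fluid
        # (one possible choice among others).
        i = max(range(n), key=lambda k: abs(F[k]))
        sent = F[i]
        H[i] += sent
        F[i] = 0.0
        for j, p_ji in cols[i]:
            F[j] += sent * p_ji
        # Recomputed in O(N) for clarity; it could be updated incrementally.
        r = sum(abs(f) for f in F)
        steps += 1
        if steps >= max_steps:
            raise RuntimeError("no convergence within max_steps")
    return H


# Hypothetical 3-node PageRank-style example (d = 0.85), stored by columns.
d, N = 0.85, 3
cols = [[(1, d)], [(0, d / 2), (2, d / 2)], [(0, d)]]
B = [(1 - d) / N] * N
H = d_iteration(cols, B, target_error=1e-12)
print([round(h, 4) for h in H])
```

Each step preserves the invariant F_n + H_n = P.H_n + B: it holds initially (H_0 = 0, F_0 = B), and diffusing `sent` from node i adds sent to H_i and P.(sent e_i) - sent e_i to F. The stopping rule |F| <= Target_Error therefore directly bounds the residual of H.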

2.2 Distributed algorithm 

We assume that the set $\Omega = \{1, \ldots, N\}$ is partitioned into K sets $\Omega_k$, k = 1, ..., K (static or dynamic, see Section 2.5). We denote by L the number of non-zero entries of the matrix P (total number of links).

2.2.1 Local information and diffusion

We distribute the computation tasks of the D-iteration scheme among K virtual machines (which we call PIDs) as follows (cf. [14]):

• each PID_k keeps the following information:

- the set of nodes it is responsible for: $\Omega_k$;

- the extracted matrix $C_k(P) = (p_{ij})_{i \in \Omega, j \in \Omega_k}$, i.e., the column vectors of P corresponding to $\Omega_k$;

- the marginal fluid vector $[F]_k = ((F)_i)_{i \in \Omega_k}$;

- the marginal history vector $[H]_k = ((H)_i)_{i \in \Omega_k}$;

- the previous history vector $[H_{old}]_k = ((H_{old})_i)_{i \in \Omega_k}$, the value of the history vector at the moment of the last fluid transmission (to other PIDs);

- its activity state: active or idle;

- the target error value: target_error;

• each PID_k maintains two local variables (evaluated periodically):

- the local residual fluid: $r_k = |[F]_k|$;

- the fluid to be transmitted: $s_k = |C_k(P)([H]_k - [H_{old}]_k)|$;

• each PID_k applies the local diffusion algorithm (*) below (when not in the idle state);

• activity state:

- initialized to active;

- PID_k's state is set to idle when

$r_k < \max(s_k/10.0, \text{target\_error} \times e/K/10)$,

where e is a factor depending on P: for the PageRank equation, e = 1 - damping_factor;

• each PID_k selects the node to be diffused by cyclically checking the elements of $\Omega_k$ against the condition:

$(F)_i \times w_i > T_k$,

where $T_k$ is a threshold value initialized to an arbitrary value larger than $\max_{i \in \Omega_k} (F)_i \times w_i$, and $w_i$ is the weight we associate with node i. The greedy approach would set $w_i = 1$; other candidates are $w_i = 1/\#out_i$ or $w_i = 1/(\#out_i \times \#in_i)$, where $\#out_i$ and $\#in_i$ are respectively the number of outgoing links from node i (number of non-zero entries of the i-th column of P) and the number of incoming links to node i (number of non-zero entries of the i-th row of P). By default, we choose $w_i = 1/\#out_i$ in this paper. When the condition $(F)_i \times w_i > T_k$ is not satisfied for any i, we apply $T_k := T_k/\gamma$ (by default, $\gamma = 1.2$).
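A minimal Python sketch of the node selection and idle test described in the list above follows. The names `select_node` and `is_idle` are hypothetical; the absolute value |F_i| is used so that the sketch also accepts signed fluid (the text assumes non-negative entries), and giving weight 1 to dangling nodes (#out_i = 0) is an assumption.

```python
def select_node(F, w, omega_k, T, start, gamma=1.2):
    """Cyclic threshold-based node selection of PID_k.

    Scans omega_k cyclically from position `start` and returns
    (i, next_start, T) for the first node i with |F[i]| * w[i] > T.
    If a full cycle finds no candidate, T is divided by gamma and the scan
    restarts.  Returns (None, start, T) when PID_k holds no fluid at all.
    """
    if all(F[i] == 0.0 for i in omega_k):
        return None, start, T
    m = len(omega_k)
    while True:
        for step in range(m):
            pos = (start + step) % m
            i = omega_k[pos]
            if abs(F[i]) * w[i] > T:
                return i, (pos + 1) % m, T
        T /= gamma  # T_k := T_k / gamma


def is_idle(r_k, s_k, target_error, e, K):
    """Idle test: r_k < max(s_k / 10.0, target_error * e / K / 10)."""
    return r_k < max(s_k / 10.0, target_error * e / K / 10.0)


# Hypothetical example: weights w_i = 1/#out_i, weight 1 for dangling nodes.
out_degree = [1, 2, 1, 0]
w = [1.0 / deg if deg > 0 else 1.0 for deg in out_degree]
F = [0.0, 0.3, 0.05, 0.2]
omega_k = [1, 2, 3]
i, start, T = select_node(F, w, omega_k, T=1.0, start=0)
print(i, start, round(T, 4))
```

In the example, node 3 is the first to pass once the threshold has been divided by 1.2 nine times, and the next scan resumes at the beginning of omega_k.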

Local diffusion for PID(k): (*)
Choose i in Omega_k;
sent := F[i];
H[i] += sent;
F[i] := 0;
For (j in Omega_k such that p(j,i) != 0):
    F[j] += sent * p(j,i);
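The local diffusion (*) can be sketched in Python as follows. The sketch assumes column-wise storage of P as lists of (j, p(j,i)) pairs and a boolean membership table `in_omega_k`, both hypothetical. It illustrates that fluid destined for other PIDs is not pushed during local diffusion. That fluid is implicitly recorded by the increase of H and recovered at the next fluid exchange.

```python
def local_diffuse(i, cols, F, H, in_omega_k):
    """Local diffusion (*) of node i by PID_k.

    Only the children j of i that belong to Omega_k receive fluid now;
    only entries of F and H indexed by Omega_k are read or written.
    Fluid for nodes owned by other PIDs is recovered later from
    C_k(P)([H]_k - [H_old]_k), since H[i] records what node i diffused.
    Returns the number of elementary operations (local diffusions),
    i.e. the increment of count_active_k.
    """
    sent = F[i]
    H[i] += sent
    F[i] = 0.0
    ops = 0
    for j, p_ji in cols[i]:
        if in_omega_k[j]:
            F[j] += sent * p_ji
            ops += 1
    return ops


# Hypothetical example: 3 nodes, PID_k owns Omega_k = {0, 1}, d = 0.85.
d = 0.85
cols = [[(1, d)], [(0, d / 2), (2, d / 2)], [(0, d)]]
in_omega_k = [True, True, False]
F = [0.05, 0.05, 0.0]
H = [0.0, 0.0, 0.0]
ops = local_diffuse(1, cols, F, H, in_omega_k)
print(ops, F, H)
```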

2.2.2 Fluid exchange 

The transmission of fluid from PID_k to the other PIDs is done when:

$s_k > r_k/2$.   (1)

The idea is simply to anticipate slightly the moment when $s_k$ and $r_k$ become equal. Each PID_k' receiving the fluid quantity $\text{received} = |[C_k(P)([H]_k - [H_{old}]_k)]_{k'}|$ reinitializes its threshold $T_{k'}$ to $\min(T_{k'} \times (r_{k'} + \text{received})/r_{k'}, \text{received})$.
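The following Python sketch illustrates one fluid exchange under stated assumptions. `owner[j]`, a hypothetical representation of the partition, gives the PID in charge of node j. s_k is computed only from the rows of C_k(P)([H]_k - [H_old]_k) owned by other PIDs. The receiver's threshold is reset with the rule given above. After a transmission, the sender would set [H_old]_k := [H]_k.

```python
def fluid_to_send(cols, H, H_old, omega_k, owner, k):
    """Fluid owed by PID_k to every other PID k': the rows of
    C_k(P)([H]_k - [H_old]_k) that belong to Omega_k'.
    Returns {k': {j: fluid}}."""
    out = {}
    for i in omega_k:
        delta = H[i] - H_old[i]
        if delta == 0.0:
            continue
        for j, p_ji in cols[i]:
            kp = owner[j]
            if kp != k:
                dest = out.setdefault(kp, {})
                dest[j] = dest.get(j, 0.0) + delta * p_ji
    return out


def should_send(s_k, r_k):
    """Transmission condition (1): s_k > r_k / 2."""
    return s_k > r_k / 2


def receive(F, fluids, T_kp, r_kp):
    """PID_k' adds the received fluid to F and returns its new threshold
    T_k' = min(T_k' * (r_k' + received) / r_k', received)."""
    received = sum(abs(v) for v in fluids.values())
    for j, v in fluids.items():
        F[j] += v
    if r_kp > 0.0:
        return min(T_kp * (r_kp + received) / r_kp, received)
    return received  # r_k' = 0 is not covered by the rule; hypothetical choice


# Hypothetical example: nodes 0, 1 owned by PID 0 and node 2 by PID 1;
# PID 0 has diffused 0.05 from node 1 since its last transmission.
d = 0.85
cols = [[(1, d)], [(0, d / 2), (2, d / 2)], [(0, d)]]
owner = [0, 0, 1]
H, H_old = [0.0, 0.05, 0.0], [0.0, 0.0, 0.0]
out = fluid_to_send(cols, H, H_old, [0, 1], owner, k=0)
s_0 = sum(abs(v) for dest in out.values() for v in dest.values())
print(out, should_send(s_0, r_k=0.07125))
```

Here s_0 is still below r_0/2, so PID 0 keeps diffusing locally before transmitting.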



2.3 PID modelling 

As in [14], we consider a time-stepped approximation for the simulation of the distributed computation cost (for now running on a single PC): during each time step, each PID_k can execute PID.Speed_k operations. By default, we set PID.Speed_k = PID.Speed = N/K (i.e., all PIDs are assumed to compute at the same speed).

When a PID is active, it increments count_active_k each time an elementary operation is done (a diffusion from one node to another node in the same set $\Omega_k$, which roughly corresponds to the product of one entry $(F)_j$ with one entry $(P)_{ij}$ of the matrix and the addition of this product to $(F)_i$).

At every time step, a local counter counts the number of elementary operations that are not consumed (because the PID enters the idle state). When a PID is idle, these wasted operations are added to count_idle_k.

In the following, the number of iterations is defined as the normalized quantity:

$\frac{count\_active_k + count\_idle_k}{L}$,

so that it can easily be compared to the cost of one matrix-vector product, i.e., one iteration of the power iteration.
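The time-stepped cost model can be sketched in Python as below. It is a simplification: `work_available` is a hypothetical input standing for the operations each PID has to perform in a step, and the extra increments of count_active_k listed in Section 2.4 (fluid reception, partition changes) are not modeled. The helper `idle_proportion` computes the idle-state ratio plotted in Figure 7.

```python
from dataclasses import dataclass


@dataclass
class PIDCounters:
    count_active: int = 0  # elementary operations actually performed
    count_idle: int = 0    # operations wasted while idle


def run_time_step(pids, speed, work_available):
    """One time step: each PID may execute `speed` elementary operations.
    work_available[k] is the number of operations PID_k has to do in this
    step; the unused part of its budget is charged to count_idle."""
    for c, work in zip(pids, work_available):
        done = min(speed, work)
        c.count_active += done
        c.count_idle += speed - done


def normalized_iterations(c, L):
    """(count_active_k + count_idle_k) / L: elapsed time measured in
    matrix-vector products of L elementary operations."""
    return (c.count_active + c.count_idle) / L


def idle_proportion(pids):
    """sum_k count_idle_k / sum_k (count_active_k + count_idle_k)."""
    total = sum(c.count_active + c.count_idle for c in pids)
    return sum(c.count_idle for c in pids) / total


# Hypothetical example: N = 1000, K = 2, L = 9543, PID.Speed = N/K = 500;
# PID 1 only has 300 operations of work per step during 10 steps.
N, K, L = 1000, 2, 9543
pids = [PIDCounters(), PIDCounters()]
for _ in range(10):
    run_time_step(pids, N // K, [500, 300])
print(normalized_iterations(pids[0], L), idle_proportion(pids))
```

Because every PID consumes its full budget each step (as active or idle operations), all PIDs report the same normalized number of iterations, and the idle proportion exposes how much of that budget was wasted.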

2.4 Computation cost 

The computation effort of PID_k is indirectly estimated through count_active_k. This counter is incremented:

• by one each time there is a local diffusion from one node to another;

• by one at the receiver for each diffusion to one of its nodes during a fluid exchange; at the sender, by one for each diffusion coming from $C_k(P)([H]_k - [H_{old}]_k)$: this is the quantity that was underestimated in [14];

• by the number of nodes re-affected during the partition set adaptation.



2.5 Partition sets

2.5.1 Static partition sets

As in [14], we consider two simple K-partition sets for comparison purposes:

• Uniform partition: $\Omega_1 = \{1, 2, \ldots, N/K\}$, $\Omega_2 = \{N/K + 1, \ldots, 2 \times N/K\}$, etc.;

• Cost Balanced (CB) partition: $\Omega_k = \{i_k, i_k + 1, \ldots, i_{k+1} - 1\}$ such that $\sum_{j=i_k}^{i_{k+1}-1} \#out_j \approx L/K$,

such that $\{1, \ldots, N\} = \Omega = \bigcup_k \Omega_k$. The intuition behind the cost balanced partition is that, when we apply the diffusion iteration to all nodes of each $\Omega_k$, the diffusion cost is the same for every PID. The main reason we chose it is the simplicity of its computation [14].

2.5.2 Dynamic partition sets

In the initial state, we start with the uniform or CB partition sets. Then, every time step (PID.Speed operations in active or idle state), we update the following quantity:

$slope_k := slope_k \times (1 - \eta) - \eta \times \log(r_k + s_k + \varepsilon)/\log(10.0)$,

where $\varepsilon$ = target_error/K/1000 is added to avoid undefined values of $slope_k$. The quantity $-slope_k$ measures the moving average of the exponent (base 10) of $r_k + s_k$: if we plot $r_k + s_k$ as a function of the (normalized) number of iterations with a logarithmic y-axis, the exponent represents the slope of the curve. By default, we used $\eta = 0.5$.

Then, every time step, we compute the indices k that maximize and minimize $slope_k$ ($i_{max}$ and $i_{min}$, respectively). If the difference is more than 50%, that is, if

$slope_{i_{min}} < slope_{i_{max}} + \log(0.5)/\log(10.0)$,

then we re-affect nodes from $\Omega_{i_{min}}$ to $\Omega_{i_{max}}$ ($i_{min}$ identifies the slowest PID); the number of re-affected nodes is computed from the ratio $(slope_{i_{min}} + 1)/(slope_{i_{max}} + 1)$ and the constant 0.1.

To limit oscillations, the sets that have just been re-affected (decreased or increased) cannot be re-affected during the next Z steps (by default Z = 10).

When a set $\Omega_k$ is re-affected, we increment its operation cost counter count_active_k (by the number of nodes modified for $\Omega_{i_{min}}$ and $\Omega_{i_{max}}$).

3. EXPERIMENTS AND EVALUATION
3.1 Synthetic data

We first used synthetic data generated as follows: assuming a power law $1/k^\alpha$ ($\alpha = 1.5$ here) for the in-degree and out-degree distributions, we generated random links between pairs of nodes (see [17] for more details).

3.1.1 Analysis of K = 2, N = 1000

Let us start with the case of 2 PIDs for an easier illustration of the problem. Figure 1 shows the convergence speed (given by $r_k + s_k$ as a function of the number of iterations) with a logarithmic y-axis, when starting from static partition sets of 250 + 750, 500 + 500 and 750 + 250 nodes (in this case, L = 9543).
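The partition rules of Section 2.5, on which the experiments rely, can be sketched in Python as below. This is a hedged illustration, not the simulator used for the results. Node indices are 0-based. `n_move` is a hypothetical stand-in for the re-affectation amount computed from (slope_imin + 1)/(slope_imax + 1) and 0.1. Moving the last nodes of Omega_imin, and always leaving it at least one node, are assumptions.

```python
import math


def uniform_partition(N, K):
    """Uniform partition: K blocks of consecutive node indices."""
    bounds = [round(k * N / K) for k in range(K + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(K)]


def cb_partition(out_degree, K):
    """Cost Balanced partition: consecutive blocks whose #out sums are about
    L/K.  A block is closed as soon as the cumulative #out reaches the next
    multiple of L/K; if the nodes run out first, the last blocks are empty."""
    L = sum(out_degree)
    parts, current, acc = [], [], 0
    for i, deg in enumerate(out_degree):
        current.append(i)
        acc += deg
        if len(parts) < K - 1 and acc >= L * (len(parts) + 1) / K:
            parts.append(current)
            current = []
    parts.append(current)
    while len(parts) < K:
        parts.append([])
    return parts


def update_slope(slope_k, r_k, s_k, target_error, K, eta=0.5):
    """slope_k := slope_k * (1 - eta) - eta * log10(r_k + s_k + epsilon),
    with epsilon = target_error / K / 1000."""
    eps = target_error / K / 1000
    return slope_k * (1 - eta) - eta * math.log10(r_k + s_k + eps)


def rebalance(parts, slopes, n_move, step, frozen_until, Z=10):
    """Move up to n_move nodes from the slowest PID (i_min) to the fastest
    (i_max) when slope_imin < slope_imax + log10(0.5), unless one of the two
    sets is frozen; both sets are then frozen for Z steps.  Returns the
    number of moved nodes (the re-affectation cost)."""
    i_min = min(range(len(slopes)), key=slopes.__getitem__)
    i_max = max(range(len(slopes)), key=slopes.__getitem__)
    if slopes[i_min] >= slopes[i_max] + math.log10(0.5):
        return 0
    if frozen_until[i_min] > step or frozen_until[i_max] > step:
        return 0
    n = min(n_move, len(parts[i_min]) - 1)  # keep at least one node
    if n <= 0:
        return 0
    moved = parts[i_min][-n:]
    del parts[i_min][-n:]
    parts[i_max].extend(moved)
    frozen_until[i_min] = frozen_until[i_max] = step + Z
    return n


print(uniform_partition(1000, 2)[1][:3])
print(cb_partition([3, 1, 1, 1, 1, 1], 2))
parts = [[0, 1, 2, 3], [4, 5]]
frozen = [0, 0]
moved = rebalance(parts, [1.0, 2.0], n_move=2, step=0, frozen_until=frozen)
print(moved, parts)
```

In the example, the first PID has a much smaller slope (its residual is more than twice that of the second one), so two of its nodes move to the second PID, and both sets are then frozen for Z = 10 steps.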



[Figure 1 plot: r_k + s_k (log scale) vs. number of iterations; curves: Single PID; 2 PIDs: 250+750 (PID1, PID2); 2 PIDs: 500+500 (PID1); 2 PIDs: 750+250 (PID1, PID2)]

Figure 1: Illustration of convergence speed: fluid exchange cost neglected.

When $\Omega_k$ is not set correctly (too big or too small), the gain from parallelism is reduced. We remark that, when the fluid exchange cost is neglected, K = 2 (500 + 500 case) can improve the convergence speed by a factor above 2. Figure 2 shows the convergence speed when the fluid exchange cost is integrated: the relative gain over the single PID case is much smaller than in the previous results, illustrating the importance of this factor even when K is small.
Figure 2: Illustration of convergence speed: fluid exchange cost integrated.

Figure 3 illustrates the impact of the partition adaptation, which brings the convergence speeds closer together: in this case, we took initial partition sets of 750+250 and let the system adapt itself.
Figure 3: Illustration of the impact of the dynamic partition.

Figure 4: Illustration of the evolution of the dynamic partition.

Figure 4 illustrates the evolution of the partition sets when starting from 750+250 (here, we took Z = 1 for a quicker adaptation).

Table 1 compares the computation time (number of iterations of the slowest PID) of the different approaches for K = 1 to 128 (Z = 10): by construction, this can be considered the most favourable situation for the uniform partition (links are independently and identically distributed over all nodes). We see that the dynamic strategy still improves the results in almost all situations, although only slightly.

To further illustrate the advantage of the dynamic adaptation, we biased the node ordering, replacing the completely random ordering used previously by an ordering based on the number of outgoing links (cf. Table 2): we see that the static CB strategy performs poorly, and when K > 4 its performance is even worse than with a single PID.

The results when the nodes are ordered by the number of incoming links are shown in Table 3: here, the uniform partition is the worst one.

       From Unif. partition    From CB partition
K      Static    Dynamic       Static    Dynamic
1      2.39      2.39          2.39      2.39
2      1.39      1.25          1.31      1.38
4      0.85      0.85          0.81      0.80
8      0.56      0.53          0.49      0.47
16     0.37      0.35          0.43      0.38
32     0.29      0.27          0.31      0.26
64     0.26      0.22          0.30      0.24
128    0.26      0.26          0.35      0.29

Table 1: Illustration of the computation time for a target error of 1/N: N = 1000.

       From Unif. partition    From CB partition
K      Static    Dynamic       Static    Dynamic
1      3.79      3.79          3.79      3.79
2      3.07      2.96          2.83      2.33
4      2.48      2.16          3.42      2.68
8      1.97      1.53          5.09      2.63
16     1.57      1.02          6.01      2.40

Table 2: Illustration of the computation time for a target error of 1/N: N = 1000. Nodes are ordered by the number of outgoing links.

       From Unif. partition    From CB partition
K      Static    Dynamic       Static    Dynamic
1      4.96      4.96          4.96      4.96
2      3.65      3.48          3.55      3.02
4      2.97      2.03          2.57      1.91
8      2.93      1.69          2.48      1.62
16     3.14      1.35          2.28      1.25

Table 3: Illustration of the computation time for a target error of 1/N: N = 1000. Nodes are ordered by the number of incoming links.

Overall, we observe that when N/K becomes too small, the gain is limited, or the performance may even degrade, due to the fluid exchange cost. Finally, we observe very good stability and performance of the dynamic partition strategy in all situations.

3.2 Web graph datasets 

For evaluation purposes, we tested the dynamic partition strategy on a web graph imported from the dataset uk-2007-05@1000000 (available on [J]), which has 41,247,159 links on 1,000,000 nodes (45,766 dangling nodes).

Below, we vary N from 1,000 to 100,000 by extracting from the dataset the information on the first N nodes.

Figure 5 shows the summarized results on the convergence speeds (normalized to the convergence cost for K = 1) for N = 1000, 10000, 100000, starting from the uniform partition (unfortunately, we could not yet handle the N = 1000000 case because of the memory limitation of a single PC). We clearly see that, because of the fluid exchange cost, the convergence becomes slower when K is too large compared to N, and that for larger N the optimal K value is larger.

Figure 6 shows the summarized results on the convergence speeds starting from the CB partition. Figures 5 and 6 correct the results reported in [14] (where the fluid exchange cost was underestimated) when N/K is small. However, the main conjecture/result, which states that, when the computation is distributed over K virtual machines, the computation speed increases almost linearly with K, with a slope becoming closer to one as the number N of linear equations to be solved increases, still holds: we conjecture that the slope goes to one asymptotically for large N/K, and this is very clearly visible in the curves of the dynamic partition based approaches (Unif+DYN or CB+DYN) when N is increased.

Figure 5: Convergence speed-up factor: starting from uniform partition.

Figure 6: Convergence speed-up factor: starting from CB partition.

             L (nb links)   L/N    D (nb dangling nodes)
N = 1000     12,935         12.9   …
N = 10000    …              12.5   80 (0.8%)
N = 100000   3,141,476      31.4   2729 (2.7%)

Table 4: Extracted graph: N = 1000 to 100000.

Figure 7 shows the consequence on the proportion of time the PIDs spend in the idle state,

$\frac{\sum_k count\_idle_k}{\sum_k (count\_active_k + count\_idle_k)}$,

when different approaches are applied (N = 10000). We see a clear reduction of the idle state with the dynamic strategy when the fluid exchange is not dominant.

Figure 7: Proportion of the idle state: N = 10000.

Figure 8 shows a typical result with two different convergence speeds: in this case, PID2 is the slowest one. The fluid exchange is done roughly every 1.2 iterations, which is clearly visible here. We can see that PID1 can enter the idle state because it is waiting for inputs from PID2 (for x between 4 and 5, between 6.5 and 7.5, etc.) when it reaches the target value $\max(s_k/10.0, \text{target\_error} \times e/K/10)$: this is globally not optimal in terms of utilization of the PIDs' computation capacity.
Figure 8: Evolution of convergence: K = 2, N = 100000 with static uniform partition.

Figure 9 shows the impact of the dynamic partition starting from the uniform partition for the same case as Figure 8. The corresponding evolution of the partition sets is shown in Figure 10.

Figure 11 shows the evolution of the convergence of the PIDs with the static CB partition: because CB is based on a heuristic simplification, it does not guarantee the same computation effort for the two PIDs.

Figure 12 shows the evolution of the convergence with K = 4 for the static uniform partition, the static CB partition, and the dynamic partition starting from the uniform and CB partitions: in this case, the benefit of the dynamic partition is very clear, with an acceleration by a factor above 3.

Figure 13 shows the evolution of the convergence with K = 8 for the static uniform partition, the static CB partition, and the dynamic partition starting from the uniform partition: in this case, the speed-up factor with the dynamic strategy is above 2.

Figure 9: Evolution of convergence: K = 2, N = 100000. Dynamic partition from the uniform partition.

Figure 10: Evolution of partition sets: K = 2, N = 100000.

Figure 11: Evolution of convergence: K = 2, N = 100000 with static CB partition.

Figure 12: Evolution of convergence: K = 4, N = 100000. Comparison of static unif., static CB and dynamic from unif. and CB.

Figure 13: Evolution of convergence: K = 8, N = 100000. Comparison of static unif., static CB and dynamic from unif.

Figure 14: Evolution of convergence: K = 128, N = 100000 with static uniform partition.

Figure 14 shows the result of 128 different convergence speeds with K = 128: in this case, we can identify the 2 slowest PIDs. The computation capacities of the 126 other PIDs are likely to be wasted.

Figure 15 shows the impact of the dynamic partition starting from the uniform partition for the same case as Figure 14. In this case, the speed-up factor is about 4, thanks to a better redistribution of the computation effort between PIDs.

Figure 16 and Figure 17 show the global convergence (an upper bound on the L1 distance to the limit) for different approaches (N = 10000): the benefit of the dynamic adaptation is more visible for K > 8. Note that those curves should be strictly decreasing functions: the local fluctuations observed here are an artefact of the time-stepped approximation (linked to the fluid exchange cost): when the fluid exchange cost becomes significant, the PID concerned is likely to be frozen during that time.

Figure 18 and Figure 19 show the global convergence for N = 100000: when K and N are larger, the analysis becomes much more complex, and we can observe significant and sudden slope changes during the iteration (see for instance the Unif+DYN or CB curves for K = 512 in Figure 19). One very visible effect is the impact of the fluid exchange cost, which increases for larger values of K.

Figure 15: Evolution of convergence: K = 128, N = 100000. Dynamic partition from the uniform partition.

Figure 16: Global convergence: N = 10000. For K = 2, 4, 8.

Figure 17: Global convergence: N = 10000. For K = 16, 32, 64.

Figure 18: Global convergence: N = 100000. For K = 16, 32, 64.

Figure 19: Global convergence: N = 100000. For K = 128, 256, 512.



The above results show in particular (and this is not surprising) that it is not necessarily better to increase K: an optimal K needs to be chosen for a given vector size N. This suggests considering a further adaptive scheme in which the number of PIDs is also adjusted dynamically; we hope to address this issue in future work. What we propose here is a first simple candidate to highlight the potential of the approach. From this first step, one may explore many variants (for instance, we should favour partition sets with more links inside the $\Omega_k$ sets; we could also define the number of nodes to be re-affected, when a modification is required, based on its CB evaluation, etc.).

4. CONCLUSION 

In this paper, we presented an adaptive dynamic partition strategy applied to a distributed computation architecture of the D-iteration method. Through experiments on synthetic data and a real dataset, we showed that a dynamic partition strategy brings robustness and a better efficiency guarantee compared to the static partition strategy, especially when N is large. We believe that, even though these are preliminary results that need to be confirmed by a real deployment of a distributed system, possibly with further adaptation or modification of the algorithm design, we have shown here the potential of a new and promising distributed computation architecture for solving very large diagonally dominant linear systems.